Abstract Questions:

Through these questions, we aimed to uncover patterns linking demographics to employment rates in Canada. The analysis in our paper highlighted broader findings about long-term trends and seasonality in employment, as well as significant differences between provinces and across sexes over time. The code below addresses some, though not all, of these questions.

Data

Subsets

Example: Ontario

##     month             variable        sex Ontario
## 1 1976-01           Employment Both sexes  3707.4
## 2 1976-01           Employment    Females  1426.7
## 3 1976-01           Employment      Males  2280.7
## 4 1976-01 Full-time employment Both sexes  3209.1
## 5 1976-01 Full-time employment    Females  1084.1
## 6 1976-01 Full-time employment      Males  2124.9
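A minimal sketch of how a provincial subset like this can be built. The source frame `lfs`, its column names, and the rows below are illustrative assumptions standing in for the project's actual objects:

```r
# Illustrative stand-in for the long-format Labour Force Survey table;
# the frame name, column names, and values are assumptions for this sketch.
lfs <- data.frame(
  month     = "1976-01",
  variable  = rep(c("Employment", "Full-time employment"), each = 3),
  sex       = rep(c("Both sexes", "Females", "Males"), times = 2),
  geography = "Ontario",
  value     = c(3707.4, 1426.7, 2280.7, 3209.1, 1084.1, 2124.9)
)

# Keep the Ontario rows and rename the value column after the province,
# matching the layout of the printed subset
ontario <- subset(lfs, geography == "Ontario",
                  select = c(month, variable, sex, value))
names(ontario)[names(ontario) == "value"] <- "Ontario"
head(ontario)
```

The same filter-and-rename step, repeated per province, yields the other provincial subsets.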

Total

Employee Count

Proportion of Workforce

Part-Time

General

Employee Count

Proportion of Total Part-Time Workers

Full-Time

Employee Count

Proportion of Total Full-Time Workers

Female

Employee Count

Rate of Female Workers per Province

Male

Employee Count

Rate of Male Workers per Province

Comparison by Sex

Constrained Analyses

Granular (Ontario, Female)

Part-Time

  1. Proportions
     • What proportion of part-time employees in Ontario are female?

     • What proportion of all Ontario female workers are part-time?

     • What proportion of all workers in Ontario are both female and part-time?

  2. Summary statistics

  3. Plots

## [1] 1.152995
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6140  0.6695  0.6867  0.6848  0.7017  0.7307

## [1] 1.350914
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2335  0.2584  0.2668  0.2664  0.2741  0.2981

## [1] 0.8670492
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.09144 0.11717 0.12285 0.12113 0.12795 0.13643
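The three proportions reduce to ratios of the relevant monthly series. The vectors below are illustrative stand-ins for the Ontario series (in thousands), not the real data:

```r
# Illustrative monthly values (thousands); stand-ins, not the real series
pt_female <- c(426.1, 431.0)    # female part-time employment
pt_total  <- c(622.3, 629.4)    # all part-time employment
f_total   <- c(1598.6, 1610.2)  # all female employment
total     <- c(3512.4, 3540.8)  # total employment

prop_pt_female  <- pt_female / pt_total  # share of part-timers who are female
prop_f_parttime <- pt_female / f_total   # share of female workers who are part-time
prop_f_and_pt   <- pt_female / total     # share of all workers who are female & part-time

summary(prop_pt_female)  # summary statistics of the first proportion
```

The full-time analogues in the next subsection follow the same pattern with the full-time series substituted.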

Full-Time

  1. Proportions
     • What proportion of full-time employees in Ontario are female?

     • What proportion of all Ontario female workers are full-time?

     • What proportion of all workers in Ontario are both female and full-time?

  2. Summary statistics

  3. Plots

## [1] 0.1252106
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3325  0.3809  0.4092  0.4059  0.4352  0.4480

## [1] 1.348098
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7018  0.7259  0.7332  0.7336  0.7416  0.7665

## [1] 0.3362953
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2861  0.3180  0.3343  0.3335  0.3521  0.3640

Modeling (Ontario, Female, Part-Time)

  1. Trend strength

  2. Seasonality strength

  3. Plots

## [1] 0.8959216
## [1] 0.01131532

##            Jun       Jul       Aug       Sep       Oct       Nov
## 2019 0.6348039 0.6139722 0.6253569 0.6381162 0.6507038 0.6363181
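The trend- and seasonality-strength figures above are consistent with the variance-of-remainder measures from Hyndman and Athanasopoulos. A sketch on a synthetic stand-in series (the real input would be the training proportion series):

```r
# Synthetic stand-in for the training series (an assumption, not real data)
set.seed(1)
y <- ts(0.69 + 0.0001 * (1:240) + rnorm(240, sd = 0.005), frequency = 12)

fit <- stl(y, s.window = "periodic")
tr  <- fit$time.series[, "trend"]
se  <- fit$time.series[, "seasonal"]
re  <- fit$time.series[, "remainder"]

# Strength = 1 minus the remainder's share of the deseasonalised (or
# detrended) variance, floored at zero
trend_strength    <- max(0, 1 - var(re) / var(tr + re))
seasonal_strength <- max(0, 1 - var(re) / var(se + re))
c(trend_strength, seasonal_strength)
```

A value near 1 indicates a strong component; here the reported 0.896 vs 0.011 points to a strongly trending but barely seasonal series.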

##       ** Teraesvirta's neural network test  **
##       Null hypothesis: Linearity in "mean" 
##       X-squared =  6.666735  df =  2  p-value =  0.03567277 
## 
##       ** White neural network test  **
##       Null hypothesis: Linearity in "mean" 
##       X-squared =  10.93809  df =  2  p-value =  0.004215253 
## 
##       ** Keenan's one-degree test for nonlinearity  **
##       Null hypothesis: The time series follows some AR process
##       F-stat =  0.003973227  p-value =  0.9497725 
## 
##       ** McLeod-Li test  **
##       Null hypothesis: The time series follows some ARIMA process
##       Maximum p-value =  0 
## 
##       ** Tsay's Test for nonlinearity **
##       Null hypothesis: The time series follows some AR process
##       F-stat =  0.7983433  p-value =  0.9278586 
## 
##       ** Likelihood ratio test for threshold nonlinearity **
##       Null hypothesis: The time series follows some AR process
##       Alternative hypothesis: The time series follows some TAR process
##       X-squared =  29.481  p-value =  0.2106931
## $Terasvirta
## 
##  Teraesvirta Neural Network Test
## 
## data:  ts(time.series)
## X-squared = 6.6667, df = 2, p-value = 0.03567
## 
## 
## $White
## 
##  White Neural Network Test
## 
## data:  ts(time.series)
## X-squared = 10.938, df = 2, p-value = 0.004215
## 
## 
## $Keenan
## $Keenan$test.stat
## [1] 0.003973227
## 
## $Keenan$df1
## [1] 1
## 
## $Keenan$df2
## [1] 385
## 
## $Keenan$p.value
## [1] 0.9497725
## 
## $Keenan$order
## [1] 16
## 
## 
## $McLeodLi
## $McLeodLi$p.values
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 
## 
## $Tsay
## $Tsay$test.stat
## [1] 0.7983433
## 
## $Tsay$p.value
## [1] 0.9278586
## 
## $Tsay$order
## [1] 16
## 
## 
## $TarTest
## $TarTest$percentiles
## [1] 24.8139 75.1861
## 
## $TarTest$test.statistic
## [1] 29.481
## 
## $TarTest$p.value
## [1] 0.2106931
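The battery of tests above matches the output format of `nonlinearityTest()` from the nonlinearTseries package; the call below is a hedged guess at how it was produced, run here on a random stand-in series:

```r
library(nonlinearTseries)

# Random stand-in for the proportion series (assumption, not the real data)
set.seed(1)
time.series <- ts(rnorm(400), frequency = 12)

# Runs the Teraesvirta, White, Keenan, McLeod-Li, Tsay, and TAR
# likelihood-ratio tests in one call; verbose = TRUE prints the block above
tests <- nonlinearityTest(time.series, verbose = TRUE)
tests$Terasvirta$p.value
```

The returned list carries one element per test (`$Terasvirta`, `$White`, `$Keenan`, `$McLeodLi`, `$Tsay`, `$TarTest`), matching the printed structure.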

## ETS(M,N,N) 
## 
## Call:
## ets(y = prop_PT.f_ontario.train)
## 
##   Smoothing parameters:
##     alpha = 0.6392 
## 
##   Initial states:
##     l = 0.693 
## 
##   sigma:  0.0112
## 
##       AIC      AICc       BIC 
## -1539.180 -1539.122 -1527.066
## [1] -0.9999242
## ETS(A,N,N) 
## 
## Call:
## ets(y = prop_PT.f_ontario.train, lambda = 1.106443)
## 
##   Box-Cox transformation: lambda= 1.1064 
## 
##   Smoothing parameters:
##     alpha = 0.6393 
## 
##   Initial states:
##     l = -0.3015 
## 
##   sigma:  0.0075
## 
##       AIC      AICc       BIC 
## -1569.636 -1569.578 -1557.523
## Series: prop_PT.f_ontario.train 
## ARIMA(0,0,0) with non-zero mean 
## 
## Coefficients:
##         mean
##       0.6930
## s.e.  0.0008
## 
## sigma^2 = 0.0002581:  log likelihood = 1136.86
## AIC=-2269.73   AICc=-2269.7   BIC=-2261.65
## Series: prop_PT.f_ontario.train 
## Model:  NNAR(26,1,14)[12] 
## Call:   nnetar(y = prop_PT.f_ontario.train)
## 
## Average of 20 networks, each of which is
## a 26-14-1 network with 393 weights
## options were - linear output units 
## 
## sigma^2 estimated as 1.69e-07
##   model           train MASE   test MASE
##   naive               0.5325      1.6587
##   random walk         0.4915      1.4284
##   best ETS            0.5028      1.2835
##   Box-Cox ETS         0.5028      1.2836
##   log-trans ETS       0.5029      1.2828
##   best ARIMA          1.0934      2.5262
##   best neural         0.0241      0.6182
##   bagged              0.4675      1.0246

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(M,N,N)
## Q* = 60.342, df = 24, p-value = 5.715e-05
## 
## Model df: 0.   Total lags used: 24

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 60.1, df = 24, p-value = 6.184e-05
## 
## Model df: 0.   Total lags used: 24

## 
##  Ljung-Box test
## 
## data:  Residuals from ETS(A,N,N)
## Q* = 60.08, df = 24, p-value = 6.224e-05
## 
## Model df: 0.   Total lags used: 24

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,0) with non-zero mean
## Q* = 4186.3, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24

## 
##  Ljung-Box test
## 
## data:  Residuals from Naive method
## Q* = 88.938, df = 24, p-value = 2.16e-09
## 
## Model df: 0.   Total lags used: 24

## 
##  Ljung-Box test
## 
## data:  Residuals from NNAR(26,1,14)[12]
## Q* = 17.676, df = 24, p-value = 0.8184
## 
## Model df: 0.   Total lags used: 24

## 
##  Ljung-Box test
## 
## data:  Residuals from baggedModel
## Q* = 132.73, df = 24, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 24
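The Ljung-Box blocks above are the printout of `checkresiduals()` from the forecast package; a minimal sketch, with a synthetic series standing in for the project's fitted models:

```r
library(forecast)

# Stand-in series and fit (assumptions, not the project's objects)
set.seed(1)
y   <- ts(rnorm(240, mean = 0.69, sd = 0.01), frequency = 12)
fit <- ets(y)

# Prints the Ljung-Box statistic (Q*, df, p-value) and plots the residual
# ACF; lag = 24 matches "Total lags used: 24" in the output above
checkresiduals(fit, lag = 24)
```

A large p-value (as for the NNAR model above) indicates residuals consistent with white noise; the other models' small p-values signal remaining autocorrelation.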

Reflection

As described on p. 17 of the final paper, the bagged model is preferred because bootstrap aggregation adapts to the data, offers greater flexibility, and avoids imposing a decreasing trend. Relative to the neural net, the bagged forecasts indicate that the series level, the proportion of Ontario part-time workers who are female, is unlikely to change substantially in the coming decade (remaining roughly 63–65%). Given the high variability in the granular series (also evident in the attempted moving-average decomposition), bagging is the most conservative choice.

To support transparency and replication for this GitHub project, the code was restructured and annotated so that others can follow the workflow and reproduce the results. This process underscored the importance of clarity in both modeling and communication, ensuring that the conclusions are not only accurate but also accessible for others to test, adapt, and extend (see p. 18 for conclusions and suggested future work).